Use of a Full Parser for Information Extraction in Molecular Biology Domain
Abstract
There is an increasing need for automatic information extraction (IE) to support database building and to intelligently find novel knowledge of biological events in online journal collections. Many previous researchers (e.g., [3]) extracted such information by applying hand-tailored regular-expression patterns to a pre-defined set of verbs representing a certain type of reaction. However, because a fact can be expressed in various forms in natural language text, many surface-expression patterns must be prepared for a single event. We propose an alternative information extraction method based on full parsing with a large-scale, general-purpose grammar. In our system, a parser converts the variety of sentences that describe the same event into a canonical structure (argument structure) consisting of the verb representing the event and its arguments, such as the (semantic) subject and object. Information extraction itself is done by pattern matching on the canonical structure. Since the variation in representation is absorbed by the parser, a relatively small number of patterns is required to extract an event. In the current work, we have designed and implemented an argument extractor using a full parser to investigate the plausibility of full analysis of text using a general-purpose parser and grammar applied to the biomedical domain. We introduce two preprocessors to address the problems of full parsers. One is a term recognizer (e.g., [1]) that glues the words in a noun phrase into one chunk so that the parser can handle them as if they were one word. The other is a shallow parser that reduces lexical ambiguity. Thus, we partially solve the inefficiency and ambiguity problems of full parsing. We also propose the use of modules that handle partial parsing results to overcome the low coverage problem.
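The idea of matching patterns on a canonical argument structure rather than on surface text can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `ArgumentStructure` and `EventPattern`, the `extract` function, the verb "activate", and the placeholder entity names do not come from the paper, and the parser itself is not implemented. The two hand-written structures stand in for what a full parser might produce for an active and a passive sentence.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ArgumentStructure:
    """Hypothetical canonical structure: a verb plus its semantic arguments."""
    verb: str     # base form of the verb representing the event
    subject: str  # semantic subject
    obj: str      # semantic object


@dataclass(frozen=True)
class EventPattern:
    """Hypothetical extraction pattern keyed on the verb only."""
    event_type: str
    verb: str


# Stand-ins for parser output (assumed, not produced by a real parser):
#   "protein_A activates protein_B."        (active voice)
#   "protein_B is activated by protein_A."  (passive voice)
# Both surface forms are assumed to map to the same canonical structure.
PARSED: List[ArgumentStructure] = [
    ArgumentStructure("activate", "protein_A", "protein_B"),
    ArgumentStructure("activate", "protein_A", "protein_B"),
]

# One pattern covers both surface forms because the parser has
# already absorbed the active/passive variation.
PATTERNS: List[EventPattern] = [EventPattern("activation", "activate")]


def extract(structure: ArgumentStructure,
            patterns: List[EventPattern]) -> Optional[Tuple[str, str, str]]:
    """Return (event_type, subject, object) for the first matching pattern."""
    for pattern in patterns:
        if pattern.verb == structure.verb:
            return (pattern.event_type, structure.subject, structure.obj)
    return None  # no pattern matches this verb


if __name__ == "__main__":
    for structure in PARSED:
        print(extract(structure, PATTERNS))
```

In this sketch a single pattern handles both sentences, whereas a surface-level approach would need at least one regular expression per voice. A real system would also need the parser and the two preprocessors described above, which are outside the scope of this example.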